🎮 Reinforcement Learning
RL Algorithms, Q-Learning, Policy Gradients, OpenAI Gym
Scoured 122294 posts in 1.79 s
Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering
arxiv.org · 10h · AI
check out this article on Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation
dev.to · 2d · Discuss: DEV · AI
General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies
arxiv.org · 10h · AI
A multi-agent reinforcement learning approach to autonomous aircraft taxiing with taxiing time, fuel consumption, and emission optimization
sciencedirect.com · 1d · AI
Show HN: Fighting the War Against Expensive Reinforcement Learning
cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app · 8h · Discuss: Hacker News · AI
Recursive self-improvement from AI models
marginalrevolution.com · 1d · Discuss: Hacker News · AI
Robotics Motion Learning: Training Linked Robot Arms with Kuramoto Models
hackernoon.com · 23h · AI
A training principle for drifting models
breno.bearblog.dev · 4h · AI
Your AI Strategy Has a Human-Shaped Hole
superiortech.io · 1h · Discuss: Hacker News · AI
ashworks1706/rlhf-from-scratch: A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and it’s applications in Large Language Models from scratch.
github.com · 2d · Discuss: Hacker News · AI
A masterclass in AI security operations
redcanary.com · 1h · AI
Feedback Control for Computer Systems
janert.org · 7h · AI
The ODE (Overview, Data, and Execution) protocol for a standardized use of machine learning in environmental,...
sciencedirect.com · 4h · AI
I benchmarked 4 CLI coding agents on an NP-hard optimization problem I solved by hand 8 years ago. One of them beat me.
charlesazam.com · 48m · Discuss: Hacker News · AI
What concrete mechanisms could lead to AI models having open-ended goals?
lesswrong.com · 1d · AI
Embodied machine learning: From research ideas to classroom activities
raspberrypi.org · 1h · AI
The 4 Mixture of Experts Architectures: How to Train 100B Models at 10B Cost
pub.towardsai.net · 2h · AI
YORU: Animal behavior detection with object-based approach for real-time closed-loop feedback
science.org · 1d · AI
Architectural and Mathematical Foundations of Machine Learning: A Rigorous Synthesis of Theory, Geometry, and Implementation
chizkidd.github.io · 1d · Discuss: Hacker News · AI
Learning Optimization Tools
trendhunter.com · 2d · AI